NSF PAR Search | NSF Public Access Repository

Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images

Mehrab, Kazi Sajeed; Maruf, M; Daw, Arka; Neog, Abhilash; Manogaran, Harish Babu; Khurana, Mridul; Feng, Zhenyang; Altintas, Bahadir; Bakis, Yasin; Campolongo, Elizabeth; et al (June 2025, CVPR)

The availability of large datasets of organism images combined with advances in artificial intelligence (AI) has significantly enhanced the study of organisms through images, unveiling biodiversity patterns and macro-evolutionary trends. However, existing machine learning (ML)-ready organism datasets have several limitations. First, these datasets often focus on species classification only, overlooking tasks involving visual traits of organisms. Second, they lack detailed visual trait annotations, like pixel-level segmentation, that are crucial for in-depth biological studies. Third, these datasets predominantly feature organisms in their natural habitats, posing challenges for aquatic species like fish, where underwater images often suffer from poor visual clarity, obscuring critical biological traits. This gap hampers the study of aquatic biodiversity patterns, which is necessary for the assessment of climate change impacts and for evolutionary research on aquatic species morphology. To address this, we introduce the Fish-Visual Trait Analysis (Fish-Vista) dataset, a large, annotated collection of about 80K fish images spanning 3,000 different species, supporting several challenging and biologically relevant tasks including species classification, trait identification, and trait segmentation. These images have been curated through a sophisticated data processing pipeline applied to a cumulative set of images obtained from various museum collections. Fish-Vista ensures that visual traits of images are clearly visible, and provides fine-grained labels of various visual traits present in each image. It also offers pixel-level annotations of 9 different traits for about 7,000 fish images, facilitating additional trait segmentation and localization tasks. The ultimate goal of Fish-Vista is to provide a clean, carefully curated, high-resolution dataset that can serve as a foundation for accelerating biological discoveries using advances in AI. Finally, we provide a comprehensive analysis of state-of-the-art deep learning techniques on Fish-Vista.
Free, publicly-accessible full text available June 15, 2026
Toward a Flexible Metadata Pipeline for Fish Specimen Images

https://doi.org/10.1007/978-3-031-39141-5_15

Jebbia, Dom; Wang, Xiaojun; Bakis, Yasin; Bart, Henry L; Greenberg, Jane (January 2023, Springer Nature Switzerland)

Full Text Available
Discovering Novel Biological Traits From Images Using Phylogeny-Guided Neural Networks

https://doi.org/10.1145/3580305.3599808

Elhamod, Mohannad; Khurana, Mridul; Manogaran, Harish Babu; Uyeda, Josef C.; Balk, Meghan A.; Dahdul, Wasila; Bakis, Yasin; Bart, Henry L.; Mabee, Paula M.; Lapp, Hilmar; et al (August 2023, KDD 2023: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining)

Full Text Available
Automatic Metadata Generation for Fish Specimen Image Collections

https://doi.org/10.1109/JCDL52503.2021.00015

Pepper, Joel; Greenberg, Jane; Bakis, Yasin; Wang, Xiaojun; Bart, Henry; Breen, David (September 2021, The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Conference Proceedings/2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL))

Conference Title: 2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL); Conference Start Date: 2021, Sept. 27; Conference End Date: 2021, Sept. 30; Conference Location: Champaign, IL, USA

Metadata are key descriptors of research data, particularly for researchers seeking to apply machine learning (ML) to the vast collections of digitized specimens. Unfortunately, the available metadata is often sparse and, at times, erroneous. Additionally, it is prohibitively expensive to address these limitations through traditional, manual means. This paper reports on research that applies machine-driven approaches to analyzing digitized fish images and extracting various important features from them. The digitized fish specimens are being analyzed as part of the Biology Guided Neural Networks (BGNN) initiative, which is developing a novel class of artificial neural networks using phylogenies and anatomy ontologies. Automatically generated metadata is crucial for identifying the high-quality images needed for the neural network's predictive analytics. Methods that combine ML and image informatics techniques allow us to rapidly enrich the existing metadata associated with the 7,244 images from the Illinois Natural History Survey (INHS) used in our study. Results show we can accurately generate many key metadata properties relevant to the BGNN project, as well as general image quality metrics (e.g., brightness and contrast). Results also show that we can accurately generate bounding boxes and segmentation masks for fish, which are needed for subsequent machine learning analyses. The automatic process outperforms humans in terms of time and accuracy, and provides a novel solution for leveraging digitized specimens in ML. This research demonstrates the ability of computational methods to enhance the digital library services associated with the tens of thousands of digitized specimens stored in open-access repositories worldwide.
Full Text Available
Hierarchy‐guided neural network for species classification

https://doi.org/10.1111/2041-210X.13768

Elhamod, Mohannad; Diamond, Kelly M.; Maga, A. Murat; Bakis, Yasin; Bart, Henry L.; Mabee, Paula; Dahdul, Wasila; Leipzig, Jeremy; Greenberg, Jane; Avants, Brian; et al (March 2022, Methods in Ecology and Evolution)

Full Text Available